# **New Opportunities in Metabolomics and Biochemical Phenotyping for Plant Systems Biology**

Gibon Yves<sup>1,2</sup>, Rolin Dominique<sup>2,3</sup>, Deborde Catherine<sup>1,2</sup>, Bernillon Stéphane<sup>1,2</sup> and Moing Annick<sup>1,2</sup>
*<sup>1</sup>INRA, UMR1332 Fruit Biology and Pathology, Centre INRA de Bordeaux, Villenave d'Ornon; <sup>2</sup>Metabolome Facility of Bordeaux Functional Genomics Centre, Centre INRA de Bordeaux, Villenave d'Ornon; <sup>3</sup>Université de Bordeaux, UMR1332 Fruit Biology and Pathology, Centre INRA de Bordeaux, Villenave d'Ornon; France*

# **1. Introduction**

Today's unsustainable use of fossil fuel reserves or green fuel is predicted to destabilize the global climate and lead to reduced food security. The key challenges for the coming decades are to meet local needs for food, in terms of both quantity and quality, while conserving natural resources and biodiversity (Ruane & Sonnino, 2011), and to develop a supply industry based on renewable plant-derived products. Indeed, agricultural crops can be viewed as the starting point for a plant-based economy and as potential input to a biorefinery in which all parts of the plant are processed and used to yield (i) food, both traditional and with enhanced nutritional safety, stability and processability; (ii) industrial products, including polymers, fibres, industrial oils and packaging materials as well as basic chemical building blocks (green chemistry); (iii) fuels such as ethanol and biodiesel; (iv) molecules with pharmaceutical properties and health benefits. To meet these new agricultural perspectives, new varieties with the appropriate properties need to be selected (Tester & Langridge, 2010) through plant breeding, be it conventional, marker-assisted, QTL-mapping-assisted, or based on genetic modification (GM) (Mittler & Blumwald, 2010). There are also growing demands for germplasm that is adapted to changing climates and effective under a range of cultural practices, and for foods with higher nutritional value. To decipher agronomical traits, functional genomics approaches can help to understand the physiological, molecular and genetic processes underlying complex traits. Appropriate functional genomics technologies such as transcriptomics, proteomics and metabolomics must be used together with detailed physiological and environmental information as a combined platform for 'candidate' gene identification or translational genomics approaches that aim to improve complex traits in plants (Sanchez et al., 2011).
Without a comprehensive understanding of the plant physiology, molecular processes and genetics of the components of complex traits, the development of new varieties will remain an empirical yet uncertain procedure. This integration of functional genomics data can be viewed as the first step to systems and predictive biology serving agricultural perspectives.

Among the 'omics' technologies, metabolomics is one of the more recently introduced. The term 'metabolome', coined in 1998 (Oliver et al., 1998), refers to the richly diverse population of small molecules present in biofluids, living cells or organisms. Overall, there are two approaches to analysing small molecules, which differ in the number of compounds analysed, the level of structural information obtained, and their sensitivity. The most common approach, metabolite profiling, is the analysis of small numbers of known metabolites in specific compound classes (e.g. sugars, amino acids or phenolics). At the other extreme, metabolic fingerprinting detects many compounds but their structures are rarely identified. Today, metabolomics methods typically allow hundreds of compounds to be measured, with a small number being definitively identified, a larger number being identified as belonging to particular compound classes, and many remaining unidentified.

Over the past decade, metabolomics has gone from being a simple concept to a rapidly growing discipline with valuable outputs in plant biology (Hall, 2006; Saito & Matsuda, 2010; Hall, 2011a; Shepherd et al., 2011). Metabolomics has played a key role in basic plant biology and is beginning to find a potentially broad field of applications. Plants produce an astonishing wealth of metabolites, estimated at between 200,000 and 1,000,000 (Dixon & Strack, 2003; Saito & Matsuda, 2010). The first significant advances were made in the area of analytical technology for metabolite identification, in order to increase our capacity to simultaneously analyse a chemically diverse range of metabolites in complex mixtures. The metabolomics community has set up analytical platforms with complementary analytical technologies (Moing et al., 2011) after realizing that no single technology currently available (or likely to be available in the near future) will be able to detect all compounds found in living cells. Today these analytical platforms provide a combination of multiple analytical techniques such as gas chromatography (GC), liquid chromatography (LC) or capillary electrophoresis (CE) coupled to mass spectrometry (MS), nuclear magnetic resonance spectroscopy (NMR), and many more (Kim et al., 2011; Lei et al., 2011).

Considering metabolomics as a combination of knowledge and know-how in biochemistry, signal processing, data and metadata handling, and data mining, the challenge remains to carry out these multidisciplinary approaches in a cohesive and coordinated manner to solve biological questions (Ferry-Dumazet et al., 2011; Hall, 2011b). Recently, plant biologists have used metabolomics approaches to understand fundamental plant processes (Leiss et al., 2010; Sulpice et al., 2010), to make a link between genotype and biochemical phenotype and to study plant responses to biotic or abiotic stresses by combining genomics and biochemical phenotyping capabilities (Redestig & Costa, 2011; Villiers et al., 2011). While full genome sequence annotations of the major crops have been published, many post-genomic studies using metabolomics approaches have tried to bridge the phenotype-genotype gap in order to link gene to function (Smith & Bluhm, 2011). Such integrated approaches have been helpful in assigning functions to a large class of genes of unknown function and in characterizing their interactions with other pathways. They are also useful in applications such as metabolic engineering (Liu et al., 2009) and assessment of GM plants (Kusano et al., 2011b).

As part of a more recent emerging area, robust data generated from metabolomics can be combined with computationally intensive approaches based on modelling of pathways to steer this field towards systems biology, which promises to provide an integrated view of cellular processes (Joyce & Palsson, 2006; Wang et al., 2006). Bringing metabolomics data to the forefront of systems biology is a challenging opportunity that implies using quantitative metabolomics data in the context of models to improve our understanding of metabolism and drive the biological discovery process. So far, computational studies on metabolomics data have often been restricted to multivariate statistical analyses such as principal component analysis or PLS discriminant analysis to look at trends among different data sets. Such work has proven useful in discovering potential biomarkers of stress and identifying key metabolic differences in GM plants, but provides minimal insight into the underlying biology or the means to modulate it for agronomic or industrial purposes. Researchers are now rising to the challenge by using omics data integration, and especially high-throughput metabolomics data, within a constraint-based framework to address fundamental questions that would increase our understanding of systems as a whole.
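The multivariate step mentioned above can be illustrated with a minimal sketch of principal component analysis. It is not taken from any of the cited studies: the function name `first_principal_component`, the use of power iteration, the absence of variable scaling and the small data table (three metabolites in three control and three stressed samples) are all assumptions chosen for illustration.

```python
import math

def first_principal_component(rows, n_iter=200):
    """Return (loadings, scores) of PC1 for a samples x metabolites table.

    Illustrative only: columns are mean-centred but not scaled, and PC1 is
    obtained by power iteration on the covariance matrix.
    """
    n, p = len(rows), len(rows[0])
    # Mean-centre each metabolite (column).
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    x = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix (p x p).
    cov = [[sum(x[i][a] * x[i][b] for i in range(n)) / (n - 1)
            for b in range(p)] for a in range(p)]
    # Power iteration: repeated multiplication converges to the dominant
    # eigenvector, provided the start vector is not orthogonal to it.
    v = [float(j + 1) for j in range(p)]
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    # Scores are the projections of the centred samples onto PC1.
    scores = [sum(x[i][j] * v[j] for j in range(p)) for i in range(n)]
    return v, scores

# Hypothetical concentrations (arbitrary units): sucrose, proline, malate.
table = [
    [10.0, 1.0, 5.0], [11.0, 1.2, 5.1], [9.5, 0.9, 4.9],  # control
    [4.0, 6.0, 5.0], [3.5, 6.5, 5.2], [4.2, 5.8, 4.8],    # stressed
]
loadings, scores = first_principal_component(table)
```

In this made-up table the two groups receive PC1 scores of opposite sign, and the loadings point to sucrose and proline as the discriminating variables. This is the kind of trend such analyses reveal, without by itself explaining the underlying biology.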

This article provides an overview of the technological trends in plant metabolomics that optimize the characterization of a large number of metabolites with accurate and absolute quantification in a few samples (concept of vertical high-throughput metabolomics), and presents the technologies needed to increase the sample analysis capacity for large-scale studies (concept of horizontal high-throughput metabolomics). This article also outlines how these technological developments in plant metabolomics can be used for systems biology, quantitative genetics and the emerging field of meta-phenomics to answer the key challenges of plant biology and agriculture in the future, and which technological and computational developments are necessary to meet these challenges.

# **2. Technological trends in plant metabolomics**

For plant metabolomics, the analytical strategies reviewed a few years ago (Weckwerth, 2007) are still widely used. Major improvements over the past five years have targeted spectral resolution and processing (http://www.metabolomicssociety.org/software.html), and the emergence of databases (http://www.metabolomicssociety.org/database.html). Thanks to technological and methodological progress, the numbers of analytes and compound families that can be determined in a given sample are still increasing, but usually at the expense of the number of samples that can be analysed, due to increasing costs and/or labour (Fig. 1). Conversely, novel experimental strategies produce increasing numbers of samples. Thus, not only does the best compromise between analyte number (vertical high-throughput approach) and sample throughput (horizontal high-throughput approach) have to be found, but also synergisms between such approaches.

# **2.1 Vertical approaches**

Vertical high-throughput approaches, also called high-density approaches, are defined as strategies that favour the number of variables measured per sample over the number of samples. They are especially interesting for studies in plants given their enormous metabolic diversity. In the plant kingdom, the species number is estimated between 270,000 (observed) and 400,000 and the number of metabolites produced between 200,000 and 1,000,000 (Dixon & Strack, 2003; Saito & Matsuda, 2010). Even the number of primary metabolites, defined as the type of compounds synthesized by all or most plant species, may exceed the number of compounds found in other eukaryotes since plants are true autotrophs (Pichersky & Lewinsohn, 2011). In addition, different plant lineages synthesize distinct sets of "specialized metabolites", often misnamed "secondary metabolites" (Pichersky & Lewinsohn, 2011), with *Arabidopsis thaliana* estimated to make up to 3,500 such specialized metabolites. Capturing such diversity is one of the challenges for plant metabolomics compared to animal metabolomics, which has to deal with 'only' 5,000 to 25,000 different metabolites (Trethewey, 2004). However, the consumption of plant-derived food is known to lead to a strong increase in metabolite diversity in animal- or human-derived samples, e.g. blood or urine. This implies that plant and nutrition scientists face a similar challenge. Indeed, specific plant metabolites are attracting attention due to their impact on health and nutrition. Vertical metabolomics mainly relies on sophisticated instrumentation such as NMR and MS, with or without hyphenation to chromatography or capillary electrophoresis (LC-NMR, LC-SPE-NMR, LC-MS, GC-MS, GC-SPE-MS, CE-MS, Fourier Transform-MS (FT-MS), Table 1).

Fig. 1. Complementarities of high-throughput vertical and horizontal biochemical phenotyping. Costs and/or labour requirements are considered similar for each technology.

Table 1. A selection of examples of plant primary and specialized metabolites detected by a variety of analytical techniques. Adapted from (Kusano et al., 2011a). PDA, photodiode array detection; GABA, gamma-aminobutyrate; TCA, tricarboxylic acid cycle; SPE, solid-phase extraction. See (Saito & Matsuda, 2010) for an overview of plant metabolomics pipelines.

Vertical approaches have to deal with a wide variety of chemical structures, which implies wide ranges of solubility, polarity and stability, as well as a high dynamic range of metabolite concentrations (>10<sup>12</sup>, (Sumner, 2010); 10<sup>6</sup>, (Saito & Matsuda, 2010)). In addition, plant metabolites are usually extracted with sometimes sophisticated protocols, including steps like heating or fractionation that may lead to the loss or modification of metabolites and may also promote the formation or introduction of chemical artefacts. This is why the term analyte, which might be a metabolite or an artefact, is preferred. For example, during the derivatization process, which is required for non-volatile compounds when performing GC-MS, a single metabolite may produce multiple derivatives leading to different peaks. Similarly, adducts and product ions are formed during the desolvation step following the ionization process in LC-MS analyses (Werner et al., 2008). To cover the wide range of chemical diversity and concentrations of plant metabolites, careful experimental design is required, including special care at harvest (Ernst, 1995), several extraction protocols and multi-analytical platforms (see (Ryan & Robards, 2006; Allwood et al., 2011) and Tables 1-2).


Table 2. Representative examples of model-, crop- and medicinal-plant metabolomics studies using different analytical platforms. DI, direct infusion; FID, flame ionization detection; HRMAS-NMR, high-resolution magic-angle spinning NMR; ICR, ion cyclotron resonance.

Currently the number of analytes quantified in a given sample in a single run is approximately 50 with proton NMR, 100-200 with GC-MS, and >1000 with LC-High-Resolution-MS (LC-HR-MS). This expansion of scale has been made possible through improved analytical capabilities and the dissemination of routine procedures between laboratories, but also through the implementation of dedicated statistical and data mining strategies. However, a large proportion of the analytes detected in plant extracts cannot be annotated and identified based solely on chemical shift and multiplicity for NMR analysis, or on elemental formula (based on m/z ratio and isotopic ratio) and chromatographic retention time for GC- or LC-MS analysis. Hence metabolite identification, which uses a variety of analytical techniques along with analyte/metabolite databases, remains difficult (Moco et al., 2007). Achieving standardization for naming compounds at the plant metabolomics community level is also an important issue, as it will enable researchers to share knowledge and speed up metabolite identification (Saito & Matsuda, 2010; Kim et al., 2011). Another challenge is the development of chemical libraries ("chimiothèques"), where trusted reference compounds would be available for the community to validate analyte identifications, for example via spiking experiments.

# **2.1.1 Optimization and combination of the different current techniques**

As already mentioned (see Tables 1-2), a combination of different current techniques is needed to cover the wide diversity of metabolites found in plants. In particular, a combination of different MS technologies is helpful for identification purposes. Among HR-MS technologies, LC-Time-Of-Flight (TOF) (resolution of 8,000-20,000, accuracy 1-5 ppm), Orbitrap® (resolution of 100,000, accuracy <1 ppm) and FT-ICR-MS (resolution >100,000, accuracy <1 ppm) are currently the most powerful ultra-high resolution (UHR) mass spectrometers. They provide molecular formula information, thus offering great possibilities in terms of metabolite identification (see (Werner et al., 2008) for the strategy, pitfalls and bottlenecks of metabolite identification). Nevertheless, poor reproducibility and fragmentation variability, even between instruments of the same brand, require a home-made metabolite database for each instrument. In addition, it should be kept in mind that plant extracts contain many isomers, i.e. compounds with identical elemental compositions and accurate masses. UHR-MS analysis of a selection of extracts may help to identify marker metabolites revealed using HR-MS on a larger range of samples. Besides, multidimensional separation techniques have emerged in order to enhance metabolite coverage in the GC-MS (Gaquerel et al., 2009; Allwood et al., 2011) and LC-MS (Lei et al., 2011) fields. Further methodologies such as Ion Mobility MS (Dwivedi et al., 2008), which have not been tested in plants so far, might also prove useful. In any case, processing and integrating data remain the major bottlenecks, and thus the most labour-intensive steps, for all these analytical platforms. Developments are nevertheless underway to automate them (see (Redestig et al., 2010) for MS and hyphenated technologies).
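To make the link between mass accuracy (in ppm) and molecular formula assignment concrete, the following minimal sketch computes monoisotopic masses and filters candidate formulas within a ppm tolerance. It is an illustration under stated assumptions, not a published pipeline: the function names are hypothetical, only C, H, N and O are handled, neutral masses are compared (adduct and charge corrections are ignored), and the measured value is invented.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes of C, H, N and O.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}

def monoisotopic_mass(formula):
    """Monoisotopic mass of a simple formula such as 'C6H12O6'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass

def ppm_error(measured, theoretical):
    """Relative mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6

def candidate_formulas(measured, formulas, tol_ppm):
    """Formulas whose theoretical mass lies within tol_ppm of the measurement."""
    return [f for f in formulas
            if abs(ppm_error(measured, monoisotopic_mass(f))) <= tol_ppm]

# Hypothetical measured neutral mass and candidate list.
hits = candidate_formulas(180.0639, ["C6H12O6", "C5H12N2O5", "C7H16O5"], tol_ppm=5.0)
```

With a 5 ppm tolerance only C6H12O6 is retained in this example, whereas a much wider tolerance would keep several candidates. Note that the retained formula is shared by isomers such as glucose and fructose, which is why accurate mass alone cannot complete the identification.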

# **2.1.2 From relative to absolute quantification of biological variability**

Whereas the "convenient" relative quantification is often used in MS studies, absolute quantification will be increasingly required. For example, various modelling approaches require precise concentrations of metabolites. Furthermore, the sharing and integration of data obtained on different analytical platforms will be greatly facilitated if they are expressed as absolute quantities. To meet the challenge of quantification, which in GC-MS and LC-ESI-MS is impaired by ion suppression or enhancement and by matrix effects, one solution is to use stable isotopomers of target metabolites or to run whole <sup>13</sup>C metabolome isotope labelling (Feldberg et al., 2009; Giavalisco et al., 2009). However, even with a stable isotope, matrix effects may impair the quantification (Jemal et al., 2003), and few isotopically labelled metabolites are currently commercially available (Lei et al., 2011). In contrast to MS-based technologies, NMR, although less sensitive, offers high reproducibility and ease of quantification, since the resonance intensity is determined only by the molar concentration (Ward et al., 2010; Kim et al., 2011).
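The proportionality between NMR resonance intensity and molar concentration can be written as a one-line calculation against an internal reference of known concentration. The sketch below is only an illustration: the function name and the example values are hypothetical, and it assumes fully relaxed spectra and well-resolved resonances, conditions that the text above does not discuss.

```python
def qnmr_concentration(area_analyte, n_h_analyte, area_ref, n_h_ref, conc_ref):
    """Concentration of an analyte from integrated 1H resonance areas.

    Each area is divided by the number of protons contributing to the
    resonance, so that the ratio reflects the molar ratio of the compounds.
    """
    return conc_ref * (area_analyte / n_h_analyte) / (area_ref / n_h_ref)

# Hypothetical example: a one-proton analyte resonance of area 2.0 compared
# with a nine-proton reference singlet of area 9.0 at 1.0 mM.
conc_mm = qnmr_concentration(2.0, 1, 9.0, 9, 1.0)
```

Because the same relation holds for every resolved resonance, a single reference compound is in principle sufficient to quantify all assigned metabolites in a spectrum.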

Surprisingly, a single extraction protocol (sometimes a one-step protocol) is typically used for a given analytical technique, regardless of the vast variety of plant matrices (plant species, organs and tissues). Very few metabolomic publications give details on extraction recovery and stability. Running blanks (solvent blank and extraction blank) under the same conditions as the biological samples is also important, as it is needed to identify impurities originating from solvents (Kaiser et al., 2009) or consumables (e.g., phthalates from plastic ware) (Allwood et al., 2011; Weckwerth, 2011). Although metabolomics is by definition an untargeted approach, highly selective extraction protocols along with targeted analysis should not be forgotten, especially to reach high and reproducible extraction recovery as well as quantification accuracy (Sawada et al., 2009). Replication is then required to achieve statistical reliability. Biological replicates should be preferred to technical replicates, assuming that biological variance almost always exceeds analytical variance (Shintu et al., 2009). Five biological replicates of five pooled-tissue samples or of five individuals and two to three technical replicates are recommended in plant metabolomics to obtain statistically reliable information (Tikunov et al., 2007). Quality control samples should also be run (Fiehn et al., 2008; Allwood et al., 2011).

# **2.2 Horizontal approaches**

Horizontal approaches are defined as strategies that promote sample number over number of variables being measured. Mutant screens and quantitative genetics are typical examples requiring horizontal high-throughput, as they typically involve experiments with hundreds to thousands of samples. Targeted assays are usually preferred due to their low needs in terms of labour and/or costs, although several untargeted strategies such as bucketing and fingerprinting are also amenable to very high numbers of samples. While the processing of raw data still represents the slowest step in vertical strategies, the toughest bottleneck in horizontal high-throughput approaches is probably sample logistics.

# **2.2.1 Sample logistics**

In large-scale experiments, harvesting, grinding and weighing become extremely work-intensive (>75% of the time), especially when samples need to be kept at very low temperatures to avoid alteration of their biochemical composition. Due to the highly dynamic nature of the metabolome, harvesting and quenching of samples in liquid nitrogen should also be achieved as quickly as possible (Ap Rees et al., 1977). Unfortunately, fast solutions are very limited (e.g. leaf punchers), and thus recruiting as many helpers as possible probably remains the best way to achieve a reliable large-scale harvest. Sample storage may also become problematic when sample turnover dramatically increases. A good way to avoid losses of samples and overfilling of -80°C freezers is to use sample management software. Although costly, available automated storage solutions may also dramatically improve the handling of samples.

Most analytical technologies require sample grinding prior to extraction and analysis. Mills enabling the parallel grinding of large numbers of samples (e.g., 192 samples) are now available at affordable prices. However, they usually do not allow multiparallel grinding of samples of large size, suggesting that further developments are needed to enable large-scale studies with organs such as fruits or ears and with most crops. Last but not least, the weighing of aliquots is a tedious task, especially when the material needs to be kept at very low temperature. A robot combining grinding and weighing of up to 96 samples has been developed recently (http://www.labman.co.uk), opening the way for unprecedented horizontal high-throughput.

# **2.2.2 Microplate technology**

The first microplate was fabricated in 1951 by the Hungarian Gyula Takátsy (Takatsy, 1955). It was made of 72 wells machined in a polymethyl methacrylate block and was used to speed up serial dilutions. This invention was driven by the need for a fast and reliable diagnostic for influenza, as Hungary was facing a major epidemic at that time. Sixty years later, the microplate format has driven the development of a huge diversity of labware and equipment, and hundreds of millions of microplates are sold every year. Sample storage, extractions and dilutions can be achieved in microplates, which remain the fastest and cheapest solution to process large numbers of samples in parallel. The quantification of various metabolites can be achieved in microplates via chemical or enzymatic reactions yielding products that can be quantified in a wide range of dedicated readers. The most common and cheapest readers are filter-based UV-visible spectrophotometers. They enable the quantification of a wide range of metabolites, including major sugars, organic acids and amino acids using endpoint methods (Bergmeyer, 1983, 1985, 1987), and metabolic intermediates that are present at much lower concentrations using kinetic assays (Gibon et al., 2002). Fluorimetry (Hausler et al., 2000) and luminometry (Roda et al., 2004) also provide high sensitivity and benefit from many commercially available fluorogenic substrates. Their use is nevertheless restricted in plants, due to the quenching of the emitted light that occurs in the presence of compounds such as polyphenols, which are usually present in plant extracts.
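As an illustration of how an endpoint reading in a UV-visible microplate reader is converted into an amount of metabolite, the sketch below applies the Beer-Lambert law. It rests on assumptions that are not stated in the text: the assay is taken to be coupled to NADH formation read at 340 nm (molar absorption coefficient 6220 L mol<sup>-1</sup> cm<sup>-1</sup>), with a known stoichiometry and a known optical path length in the well. The function name and example values are hypothetical.

```python
EPSILON_NADH_340 = 6220.0  # L mol^-1 cm^-1, NADH at 340 nm

def endpoint_amount_nmol(delta_a340, path_cm, well_volume_ul, nadh_per_metabolite=1.0):
    """Amount of metabolite (nmol) in a well from an endpoint absorbance change.

    Assumes an NADH-coupled assay; path_cm is the optical path length of the
    liquid in the well, which depends on the plate type and the volume used.
    """
    # Beer-Lambert law: concentration (mol/L) of NADH formed in the well.
    conc_molar = delta_a340 / (EPSILON_NADH_340 * path_cm)
    # mol/L x L gives mol; the factor 1e9 converts mol to nmol.
    nmol_nadh = conc_molar * (well_volume_ul * 1e-6) * 1e9
    return nmol_nadh / nadh_per_metabolite

# Hypothetical reading: absorbance change of 0.311 in 200 uL with a 0.5 cm path.
amount = endpoint_amount_nmol(0.311, path_cm=0.5, well_volume_ul=200.0)
```

In practice the path length is often avoided altogether by running a standard curve of the metabolite on the same plate. The calculation above only shows the principle.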

Throughput on microplates can be dramatically increased by using pipetting robots, which can handle up to 1536 samples in parallel and down to the nanoliter scale, depending on the brand. Thus, using one 96-head robot equipped with microplate handling and a series of microplate readers, a single person can run the determination of a given metabolite in thousands of samples per week. Increasing the number of analytes would nevertheless result in a decrease in sample throughput, roughly by a factor of 2 for each additional analyte. It is estimated that, at equal costs, such an approach might be advantageous for 10 to 20 analytes over other targeted technologies such as LC-MS/MS, which have already proven efficient for the capture of relatively large numbers of metabolites from the same class at high throughput (Rashed et al., 1997; see also Section 2.2.3 below). Microplates of increasing density formats (up to 9600 wells per plate) have been released to increase the overall throughput of analyses and decrease the costs per assay. Such miniaturisation nevertheless faces the physical constraints of delivering very small volumes to wells and of detecting responses in a manner that is both sensitive and rapid (Battersby & Trau, 2002). The use of volumes in the nanoliter range is also limited by quick evaporation of the solvent used for analysis. A further drawback is the high cost of equipment (e.g., pipetting robots and readers able to handle high density plates), which implies that very high numbers of samples have to be processed before the cost per assay decreases. These limitations probably explain why high density microplates have not been adopted by a wide research community so far.

# **2.2.3 Targeted MS technologies: Quantification of selected biochemical markers using LC-MS**

Targeted analysis of small molecules using MS may use different technologies: GC-MS (Koek et al., 2011), CE-MS (Ramautar et al., 2011), LC-MS and more recently MALDI-MS (Shroff et al., 2009). Only LC-MS will be dealt with here. LC-MS technology was used for quantification long before the 'omics' era. Despite the technical skill it requires and its high cost, it has gained popularity in the metabolomics field. Triple quadrupole analyzers (TQMS) are the workhorses of LC-MS quantification. They are mostly operated in multiple reaction monitoring (MRM) mode to achieve high selectivity and sensitivity. A promising new approach is the use of high resolution extracted ion chromatograms from full scans of high resolution instruments (Lu et al., 2008). Its main advantages over MRM are the virtually unlimited number of monitored compounds and the possibility to reanalyze data after acquisition by extracting ion chromatograms corresponding to new compounds of interest.
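The principle of extracting an ion chromatogram from full-scan high resolution data, and of re-extracting another one later for a new compound of interest, can be sketched as follows. The in-memory data layout (a list of retention time and peak-list pairs), the function name and all m/z and intensity values are assumptions made for illustration; real data would be read from an instrument or open-format file.

```python
def extract_ion_chromatogram(scans, target_mz, tol_ppm=5.0):
    """Sum, scan by scan, the intensities found within tol_ppm of target_mz.

    scans: list of (retention_time, [(mz, intensity), ...]) full-scan spectra.
    Returns a list of (retention_time, summed_intensity) pairs.
    """
    half_width = target_mz * tol_ppm * 1e-6  # absolute m/z window
    chromatogram = []
    for retention_time, peaks in scans:
        intensity = sum((i for mz, i in peaks if abs(mz - target_mz) <= half_width), 0.0)
        chromatogram.append((retention_time, intensity))
    return chromatogram

# Hypothetical full-scan data: three scans with a few centroided peaks each.
scans = [
    (0.0, [(181.0707, 100.0), (205.0683, 50.0)]),
    (1.0, [(181.0709, 900.0), (181.0900, 300.0)]),
    (2.0, [(181.0706, 120.0)]),
]
xic_first = extract_ion_chromatogram(scans, 181.0707)
# The same acquisition can be mined again later for another ion.
xic_second = extract_ion_chromatogram(scans, 205.0683)
```

The narrow window excludes the nearby peak at m/z 181.0900 in the second scan, which illustrates why high resolution is required for this strategy to approach the selectivity of MRM.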

Calibration of these methods mostly relies on internal calibration, with or without the use of stable isotope analogs (Ciccimaro & Blair, 2010). For instance, quantification of amino acids by LC-MS (MRM) in barley was calibrated using d2-Phe as an internal standard. The interday precision of the method ranged from 3.7 to 9.4 % RSD, depending on the amino acid (Thiele et al., 2008). However, an isotope dilution calibration is not always possible due to the lack of the corresponding labelled metabolite or its cost. These targeted LC-MS methods must undergo a complete method validation. They need fast separation, high selectivity, and a linearity range and limits of quantification in agreement with the metabolite level. For methods involving atmospheric pressure ionization, the matrix effect on quantification should be carefully evaluated and minimized (Trufelli et al., 2011). Moreover, to be relevant, these methods must obviously be applied after an exhaustive extraction evaluated by recovery procedures.
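A minimal sketch of internal-standard calibration and of the precision metric quoted above (% RSD) is given below. It is not the procedure of the cited studies: the single-point relative response factor, the function names and the numerical values are assumptions chosen for illustration, and a validated method would use a multi-point calibration curve.

```python
import statistics

def response_factor(area_std, conc_std, area_is, conc_is):
    """Relative response factor from a calibration mix of known composition."""
    return (area_std / area_is) / (conc_std / conc_is)

def quantify(area_analyte, area_is, conc_is, rf=1.0):
    """Analyte concentration from its peak area relative to the internal standard."""
    return (area_analyte / area_is) * conc_is / rf

def percent_rsd(values):
    """Relative standard deviation (%), e.g. of repeated runs on different days."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0

# Hypothetical calibration: analyte and internal standard both at 10 uM.
rf = response_factor(area_std=2000.0, conc_std=10.0, area_is=1000.0, conc_is=10.0)
# Hypothetical sample spiked with 10 uM internal standard.
conc = quantify(area_analyte=3000.0, area_is=1000.0, conc_is=10.0, rf=rf)
# Hypothetical interday series for the same sample.
precision = percent_rsd([9.0, 10.0, 11.0])
```

With a stable isotope analog as internal standard, the response factor is expected to be close to 1, because the labelled and unlabelled forms behave almost identically during extraction, separation and ionization.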

Targeted approaches have been applied in functional genomics. A "widely targeted" metabolomics approach based on LC-MS (MRM) has been proposed (Sawada et al., 2009). It consisted of repeated UPLC-TQMS analyses performed on the same sample. Each 3 min analytical run allowed simultaneous detection of 5 compounds. The expected throughput was estimated at 1,000 biological samples per week for quantification of about 500 metabolites. This methodology was later applied to mature seeds of 2656 mutants and 225 *Arabidopsis* accessions for 17 amino acids, 18 glucosinolate derivatives and one flavonoid, leading to the characterization of amino acid hyper-accumulating genotypes (Hirai et al., 2010). Targeted approaches have also been applied in phytochemistry and phytomedicine. For instance, Chinese medicinal herbs were tested for two secondary metabolites inducing nephrotoxicity. The UPLC-MS (MRM) run lasted 5 min and was amenable to high-throughput analyses (Jacob et al., 2007). This method was however impaired by a strong matrix effect that could not be prevented, and another approach was preferred. Finally, these techniques have also been shown to be a must for some classes of compounds such as hormones (Kojima et al., 2009), intermediates of central metabolism (Arrivault et al., 2009) and pesticides (Kmellar et al., 2010). Indeed, they combine the appropriate selectivity, sensitivity and throughput.

# **2.2.4 Further technologies**

Other technologies involve miniaturization of the separation step used prior to detection. A key step in the miniaturization and automation of chromatography is the development of microfluidic systems, which process or manipulate very small volumes (down to 10<sup>-18</sup> L) using channels of micrometre dimensions (Whitesides, 2006). The fact that factors such as surface tension and viscosity behave very differently in such systems brings many new possibilities to control concentrations and behaviours of molecules, particles or even cells (Nagrath et al., 2007) in space and time. Thus, soft lithography on e.g. poly(dimethylsiloxane) (McDonald et al., 2000) or polypropylene (Vengasandra et al., 2010) enables the design of reservoirs, channels, valves, and reaction chambers that can be used to separate and transform a wide range of molecules. Combined with detection systems such as laser-induced fluorescence (Jiang et al., 2000), infrared spectroscopy (Shaw et al., 2009) or electrochemical electrodes (Eklund et al., 2006), they are well suited for massively parallel assays and provide the advantage of using very small amounts of reagents and samples. Further advantages are high resolution and sensitivity as well as fast analysis.

The use of microfluidic systems for metabolite analysis has just begun. Whereas applications targeting one molecule, e.g. glucose (Atalay et al., 2009), have been developed, the possibility to separate molecules has already enabled the profiling of classes of metabolites such as glucosinolates (Fouad et al., 2008) or flavonoids (Hompesch et al., 2005). Furthermore, the ease of creating systems able to distribute fluids into multiple channels enables the performance of several assays in parallel (Moser et al., 2002), or even n-multidimensional separations (Tomas et al., 2008) that would eventually be coupled to various detection devices, opening unprecedented possibilities for targeted and untargeted metabolomics.

Unfortunately, microfluidics have not yet benefited from standardisation, which hampers their adoption by a wide research community. Besides involving complex designs and fabrication techniques that prohibit widespread use due to cost and/or production time, microfluidic systems may require unfamiliar laboratory habits. Therefore, one logical next step is integration with the standardized microplate layout, thus taking advantage of the extensive work of the lab automation community (Choi & Cunningham, 2007; Halpin & Spence, 2010). Strikingly, such integration might ultimately result in methodologies enabling high density analyses on very large numbers of samples, thus breaking the relationship depicted in Figure 1. Finally, microfluidics and more generally nanotechnologies almost certainly have much more to offer, as future developments could for example lead to portable systems that would allow metabolite profiling directly in the field, thus shortcutting sample handling, or even to chips mounted on growing plants that would be able to monitor fluxes *in situ*.

### **2.3 Complementarities of vertical and horizontal approaches**

The combination of horizontal and vertical high-throughput approaches (Fig. 1) is of particular interest, as it has the potential to dramatically speed up the process of discovery. Thus, depending on the objectives of the study, an untargeted approach can be used first on a selection of samples to identify the most discriminating biomarkers, which would then be analyzed on a much greater number of samples using a targeted approach (Tarpley et al., 2005). For example, such a strategy has been successfully used in maize, where a number of enzymes were first profiled in a small panel of eight highly diverse inbred lines, revealing a highly heritable variation in NAD-dependent isocitrate dehydrogenase activity. The use of a panel of about one hundred lines then allowed the identification of a novel amino acid substitution in a phylogenetically conserved site, which is associated with variation in isocitrate dehydrogenase activity (Zhang et al., 2010). Conversely, a horizontal approach can be used to screen large numbers of samples, thus revealing the most extreme or representative ones, on which a vertical approach can then be used to search for unexpected modifications, to study the system as a whole in the best possible matrix of samples, or simply to find novel biomarkers. As an example, the easy-to-measure glucose-6-phosphate, which is a good temporal marker of carbon depletion (Stitt et al., 2007), has been used to define a precise time frame to study transcriptomic and metabolomic responses to carbon starvation in Arabidopsis leaves (Usadel et al., 2008), thus avoiding unnecessary and costly analyses.

# **3. Key challenges for plant metabolomics**

An increasing number of approaches benefit from plant metabolomics. Among them, systems biology, quantitative genetics and meta-phenomics offer particularly exciting yet challenging perspectives.

### **3.1 Systems biology**

In the context of plant functional genomics, the combination of metabolomics, proteomics, and transcriptomics has made it possible to decipher and understand dynamic interactions in metabolic networks and to discover new correlations with biochemically characterized pathways as well as pathways hitherto unknown (Zhang et al., 2009; Williams et al., 2010). The main lesson from these and similar studies is that metabolic pathways are highly interactive rather than operating as separate units. In each biological system (cell, tissue, organism) there are metabolic networks in place, which are highly flexible and have a huge capacity to provide compensatory mechanisms through regulatory processes. These observations explain why many dedicated GMO strategies ended up with silent phenotypes (Weckwerth et al., 2004) and have also convinced many researchers to study metabolic networks as a whole, and not as a sum of parts, moving from reductionist to holistic approaches.

Although systems biology may mean different things to different people, there is a common understanding that this discipline is a comprehensive quantitative analysis of the manner in which all the components of a biological system (cell, tissue, organisms, communities) interact functionally over time. Systems biology aims at combining omics data resulting from complex networks into computational models. Besides integration with upstream levels (genome, transcriptome, proteome), metabolite data also have to be integrated with data from downstream levels (e.g. growth, performance). The quantitative data are the starting point for the formulation of mathematical models, which are refined by hypothesis-driven, iterative systems perturbations and data integrations. Cycles of iteration result in a more accurate model and ultimately the model explains emergent properties of the biological system of interest. Once the model is sufficiently accurate and detailed, it allows biologists to accomplish two tasks: (1) predict the behavior of the system given any perturbation such as a modification of the environment, and (2) redesign or perturb the gene regulatory network to create completely new emergent systems properties (Vidal, 2009; Westerhoff et al., 2009; Arkin & Schaffer, 2011).
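The first of these two tasks, predicting the response to a perturbation, can be illustrated with a deliberately small kinetic model. The sketch below is a toy example and is not taken from any of the cited studies: the two-metabolite pathway (constant influx into A, A converted to B, B consumed), the mass-action rate laws and all parameter values are assumptions chosen for illustration, and simple Euler integration stands in for a proper ODE solver.

```python
def simulate(v_in, k1, k2, a0=0.0, b0=0.0, dt=0.01, t_end=50.0):
    """Euler integration of a hypothetical two-metabolite pathway.

    dA/dt = v_in - k1 * A      (constant influx, first-order conversion of A to B)
    dB/dt = k1 * A - k2 * B    (first-order consumption of B)
    Returns the pool sizes (A, B) reached at t_end.
    """
    a, b = a0, b0
    for _ in range(int(round(t_end / dt))):
        da = v_in - k1 * a
        db = k1 * a - k2 * b
        a += dt * da
        b += dt * db
    return a, b


# Reference state (arbitrary units; all values are illustrative assumptions).
reference = simulate(v_in=2.0, k1=1.0, k2=0.5)

# Perturbation: the capacity of the first step (k1) is halved.
perturbed = simulate(v_in=2.0, k1=0.5, k2=0.5)

print("reference A, B:", reference)
print("perturbed A, B:", perturbed)
```

In this toy system the analytical steady states are A = v_in / k1 and B = v_in / k2, so halving k1 doubles the pool of A while leaving B and the flux through the pathway unchanged. This is a very simple instance of a compensatory mechanism that yields a "silent" downstream phenotype.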

Exciting examples of integrated systems biology used to solve biological questions in plant science have been published, such as the identification of key players in branched amino acid metabolism in *A. thaliana* (Curien et al., 2009), the analysis of carbohydrate dynamics during acclimation to low temperature in *A. thaliana* (Nagele et al., 2011), or the understanding of the metabolism of tobacco grown on media containing different cytokinins (Lexa et al., 2003). Systems biology will benefit from close collaborations between different teams covering complementary sectors of metabolism, e.g. central metabolism and different sectors of secondary metabolism. The challenge in establishing such systems approaches lies in collecting reliable, quantitative and systemic "omics" data, including metabolomics data, in order to develop models able to predict biological outcomes de novo, given the list of the components involved. Advances in plant genome sequencing, transcriptomics and proteomics have paved the way for a systematic analysis of cellular processes at the gene and protein levels. For metabolomics, some limitations remain for real systems biology approaches, in terms of analytical sensitivity, throughput and access to specific tissues or subcellular compartments. Moreover, the high turnover rate of many metabolic intermediates has to be taken into consideration. In addition, the absolute quantification of metabolites under physiological, *in vivo* and dynamic conditions remains a major challenge. The cohesive combination of existing multiparallel analytical platforms, with special attention to metabolite quantification (see Sections 2.1.2 and 2.2.3), may not be sufficient, and emerging microtechnologies such as microfluidics will certainly help (see Section 2.2.4 and Wurm et al., 2010).

Recently, plant systems biology has been redefined from cell to ecosystem (Keurentjes et al., 2011). For these authors, in a holistic systems-biology approach, plants have to be studied at six levels of biological organization (from subcellular level to ecosystem) in an orchestrated way, with special attention to the interdependence between the various levels. The corresponding challenge will be to generate accurate experimental data for communities, populations, single whole plants, down to cell types and their organelles, that can be used to feed new modelling concepts. For example, at the subcellular level, molecular signaling pathways are crucial to understand cell development, defense against pathogens and many more intermediate processes in plants. The highly sensitive and high-throughput method developed for the simultaneous analysis of 43 molecular species of cytokinins, auxins, ABA and gibberellins (Kojima et al., 2009) has opened up the opportunity to routinely describe basic molecular signaling pathways in plant cells. Other challenges need to be considered on the dry-lab side. Because systems biology relies heavily on information stored in public databases for the different levels of biological organization, which is often incomplete, not standardized or improperly annotated, it is essential that collective efforts are developed for the validation of large data sets. Plant network biology is in its infancy, and other current needs range from the development of new theoretical methods to characterize network topology to insights into the dynamics of motif clusters and biological function.

### **3.2 Quantitative genetics**

Quantitative genetics, which aims at associating quantitative traits with genomic regions called quantitative trait loci (QTL), represents a great opportunity to understand the diversity of plant metabolism and its relationship to nutritional value or biomass production. Studies combining metabolomics and quantitative genetics performed in Arabidopsis seedlings (Keurentjes et al., 2006) and tomato fruits (Schauer et al., 2006) have shown that variations in metabolite levels are to a large extent heritable, and have identified large numbers of metabolite QTL, implying that levels of metabolites of interest could be controlled by manipulating small genome regions (Saito & Matsuda, 2010). Conversely, genetic diversity has been used to study the behaviour of metabolic networks and the way they integrate with whole plant traits, eventually revealing links between metabolic composition and growth (Meyer et al., 2007). Such findings call for multivariate QTL mapping (Calinski et al., 2000), thus opening exciting perspectives for the manipulation of plant performance.
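The principle of associating a metabolite level with a genomic region can be sketched with the simplest possible analysis, a single-marker scan. The example below is an illustration only, not the method used in the cited studies, which rely on more elaborate approaches such as interval or multivariate mapping. The marker names, the 0/1 allele coding (as in a population of homozygous lines) and the metabolite values are hypothetical.

```python
from statistics import mean


def marker_r2(genotypes, trait):
    """Fraction of trait variance explained by one biallelic marker coded 0/1."""
    grand = mean(trait)
    ss_total = sum((y - grand) ** 2 for y in trait)
    ss_between = 0.0
    for allele in (0, 1):
        group = [y for g, y in zip(genotypes, trait) if g == allele]
        if group:
            ss_between += len(group) * (mean(group) - grand) ** 2
    return ss_between / ss_total if ss_total > 0 else 0.0


def scan(markers, trait):
    """Score every marker against one metabolic trait."""
    return {name: marker_r2(genotypes, trait) for name, genotypes in markers.items()}


# Hypothetical data: six lines, one metabolite level per line, two markers.
metabolite = [1.0, 1.2, 0.9, 3.0, 3.1, 2.9]
markers = {
    "m1": [0, 0, 0, 1, 1, 1],  # segregates with the metabolite level
    "m2": [0, 1, 0, 1, 0, 1],  # essentially unlinked
}

scores = scan(markers, metabolite)
best = max(scores, key=scores.get)
print(best, scores)
```

A real analysis would add a significance threshold corrected for multiple testing and would account for linkage between neighbouring markers; the sketch only shows how a heritable difference in metabolite level translates into a high score at the linked marker.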

The identification of the molecular bases underlying QTL has usually been a major challenge, and several years of hard work were typically necessary to unravel just one of them. However, thanks to the development of increasingly powerful methodologies exploiting genetic diversity, which combine linkage and/or association mapping with high-density genotyping, the elucidation of such molecular bases can now be achieved much more quickly (Myles et al., 2009). The other side of the coin is that these methodologies require increasingly large experiments. Thus, the nested association mapping (NAM) approach recently developed in maize (Yu et al., 2008) already involves 5,000 genotypes (25 mapping populations of 200 genotypes each), which would represent at least 5,000 samples to process. Unfortunately, due to technical and financial limitations, the processing of so many samples remains very unusual in plant metabolomics. Furthermore, taking into account different growth scenarios, temporal aspects or different organs or tissues would result in factorial increases in the number of samples. As mentioned above, combinations of horizontal and vertical metabolomics might nevertheless be very useful to decrease costs and labour. For instance, small sub-panels with high genetic diversity can be used first to assess heritability for a large number of metabolic traits, the selected traits then being evaluated in full panels using inexpensive and fast methods.
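Both the sample-number arithmetic and the sub-panel screening step can be made concrete. In the sketch below, only the NAM figures (25 populations of 200 genotypes) come from the text. The numbers of growth scenarios, time points, organs and replicates, as well as the sub-panel data, are hypothetical, and the heritability estimate assumes a balanced one-way ANOVA with genotype as the only factor, which is one common simplification among several possible estimators.

```python
from math import prod
from statistics import mean


def n_samples(genotypes, **factor_levels):
    """Samples in a full factorial design: genotypes times every factor's level count."""
    return genotypes * prod(factor_levels.values())


def broad_sense_h2(replicates_by_genotype):
    """Broad-sense heritability from a balanced one-way ANOVA on genotype."""
    groups = list(replicates_by_genotype.values())
    g, r = len(groups), len(groups[0])
    if g < 2 or r < 2 or any(len(v) != r for v in groups):
        raise ValueError("need >= 2 genotypes, each with the same number (>= 2) of replicates")
    grand = mean(x for v in groups for x in v)
    ms_between = r * sum((mean(v) - grand) ** 2 for v in groups) / (g - 1)
    ms_within = sum((x - mean(v)) ** 2 for v in groups for x in v) / (g * (r - 1))
    var_g = max((ms_between - ms_within) / r, 0.0)  # negative estimates are set to zero
    total = var_g + ms_within
    return var_g / total if total > 0 else 0.0


# NAM design from the text: 25 mapping populations of 200 genotypes each.
nam_genotypes = 25 * 200
baseline = n_samples(nam_genotypes)  # one sample per genotype

# Hypothetical full design: 2 growth scenarios, 3 time points, 2 organs, 3 replicates.
full = n_samples(nam_genotypes, scenarios=2, time_points=3, organs=2, replicates=3)

# Hypothetical sub-panel: one metabolite measured in duplicate on three diverse lines.
sub_panel = {"line_A": [10.0, 12.0], "line_B": [20.0, 22.0], "line_C": [30.0, 32.0]}
h2 = broad_sense_h2(sub_panel)

print(baseline, full, round(h2, 3))
```

With these assumed factor levels the design grows from 5,000 to 180,000 samples, which illustrates why traits are worth pre-selecting on a small panel before the full population is analysed with a fast targeted method.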

Finally and importantly, genetic divergence and phenotypic divergence are two different things (Kozak et al., 2011). Accordingly, one single gene can be responsible for huge phenotypic variations and one single trait can be controlled by many QTL. One consequence is that molecular marker-assisted breeding might not always be the best and/or cheapest solution to select genotypes yielding phenotypes of interest. Therefore, it is pertinent to explore the possibility of using alternative biomarkers, including metabolites that can be measured at reasonable cost in very large populations.

### **3.3 Meta-phenomics**

Comparing different species is a powerful way to extend knowledge about biological processes. Thus, comparative genomics facilitates the assignment of gene function in non-sequenced organisms, enables the quick annotation of newly sequenced genomes and greatly contributes to studies of gene function and evolution. For example, extensive synteny between genomes of Gramineae species has been shown (Salse, 2004) and QTL controlling similar traits have been found in orthologous regions of e.g. maize and sorghum (Figueiredo et al., 2010). Conversely, the fact that orthologous genes do not necessarily have the same functions in different species (Buckler et al., 2009) opens fascinating perspectives regarding the evolution of gene function (Wang et al., 2009).

Finding common and divergent phenotypes among large numbers of species is also a promising way to better understand biological functions in the context of evolution. Meta-phenomics, which has recently been proposed by Poorter and colleagues (Poorter et al., 2009; Poorter et al., 2010), is defined as the study of plant responses to environmental factors by performing meta-analyses. This novel ecophysiological approach aims at generalising plant responses by integrating phenotypic and environmental data gathered for large numbers of species. Thus, by using accurate normalisation procedures, generic response curves were found for surface leaf area as related to major abiotic factors. Notably, data for >300 species had to be collected and curated manually from 60 years of literature. One exciting finding is that divergences between groups of species, for example C3 and C4 species, could be pinpointed. There is no doubt that meta-phenomics is applicable at the cellular level, and in particular to metabolic pathways, and C3 and C4 metabotypes are indeed easy to distinguish when comparing their respective metabolomes. However, this might be considerably complicated by the heterogeneity of available metabolic data (in terms of e.g. annotation and normalisation). Furthermore, descriptions of environmental conditions found in the literature are almost always text-based, and thus very difficult to compute. Fortunately, the adoption and use of standardised conceptualisations with explicit specifications to report data and metadata (i.e. minimum checklists) is progressing in the field of metabolomics (Fiehn et al., 2007a; Fiehn et al., 2007b). It will nevertheless be of central importance to prefer absolute quantification and to enable quantitative descriptions of environmental factors, which will probably be facilitated via collaborations with ecophysiologists.
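To indicate why quantitative descriptions of the environment matter for such meta-analyses, the sketch below shows one simple way of putting data from different experiments on a common scale: each trait value is divided by the value the same experiment would give at a shared reference level of the environmental factor, obtained by linear interpolation. This is an assumed, simplified normalisation written for illustration and is not claimed to reproduce the procedure of Poorter and colleagues. The experiment names, factor levels and trait values are hypothetical.

```python
def value_at(points, x_ref):
    """Linearly interpolate the trait value at x_ref from (level, value) pairs."""
    pts = sorted(points)
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= x_ref <= x1:
            if x1 == x0:
                return y0
            return y0 + (y1 - y0) * (x_ref - x0) / (x1 - x0)
    raise ValueError("reference level lies outside the measured range")


def normalise(points, x_ref):
    """Express every trait value relative to the value at the reference level."""
    ref = value_at(points, x_ref)
    return [(x, y / ref) for x, y in sorted(points)]


# Hypothetical experiments: (level of an environmental factor, trait value),
# reported in different trait units by different laboratories.
experiments = {
    "species_1": [(5, 10.0), (15, 20.0), (25, 25.0)],
    "species_2": [(10, 2.0), (20, 3.0)],
}

reference_level = 10  # shared reference level of the factor (an assumption)
pooled = {name: normalise(pts, reference_level) for name, pts in experiments.items()}
print(pooled)
```

Once every experiment is expressed relative to the same reference level, the curves can be pooled across species. An experiment whose environmental conditions are only described in words cannot enter this calculation at all, which is the practical obstacle raised above.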

# **4. Conclusion**

Like metabolomics in general (Hall et al., 2011), plant metabolomics is moving towards biology, with a growing variety of applications ranging from the 'simple' diagnosis of culture practices to translational studies and systems biology. However, for some of the emerging applications, the optimization of analytical and computational technologies for the acquisition, handling and mining of metabolomics data remains necessary. Some of the crucial bottlenecks that still have to be addressed concern quantification for modelling, time-resolved and spatially resolved experiments, multi-experiment analyses and data sharing.

The promotion of combined multi-experiment and multi-laboratory analyses (Allwood et al., 2009; Ward et al., 2010) for high sample numbers, which are indispensable for some ecology or quantitative genetics studies for instance, requires shared plant biological standards (labelled or non-labelled) and standardization of their use. Absolute quantification data, needed for metabolism modelling in systems biology approaches, also require isotopically labelled plant standards or at least labelled reference compounds for MS approaches. The generalisation of time-resolved experiments, for instance for the study of fine metabolism regulation or short-term responses to stresses, will need further increases in horizontal high-throughput using microplates, microfluidics or other technologies. Besides increased throughput, increased sensitivity for all the analytical technologies listed in this review may open new insights into the use of metabolomics for plant development studies. Spatially resolved experiments with analysis of laser-microdissected samples by NMR or MS (Moco et al., 2009; Kim et al., 2011) will be particularly useful for the study of plant-pathogen interactions. The generalization of metabolite compartmentation studies in plant tissues at the cellular and subcellular levels, possibly with non-aqueous fractionation (Krueger et al., 2011), will also require increases in both horizontal high-throughput and sensitivity.

Moreover, the systematic sharing, combining, and re-exploring of the data produced using targeted metabolic phenotyping or untargeted metabolomics will produce new knowledge. Cataloguing the metabolome itself through experimental and literature data stored in curated databases can complement genomic reconstructions of metabolism (Fiehn et al., 2011). Understanding the regulation of the plasticity and flexibility of metabolic networks requires that the metadata of each experiment, including environment metadata (Hannemann et al., 2009), be carefully documented and uploaded into a central or distributed network repository dedicated to plants. This suggests that the Metabolomics Standards Initiative (MSI; Fiehn et al., 2007a) has to continue to propose and promote standardization criteria that will be integrated into the bioinformatics developments of open repositories and used by the community. In addition, sophisticated but easy-to-use tools are needed for combining metabolomics data, integrating them with other phenotyping or omics data, and performing integrated statistical analyses and modelling. The plant metabolome community may benefit from more interaction with the human metabolome community for the use and development of such tools, and both may address combined analyses of food quality determinants (Hall et al., 2008) and the monitoring of human food consumption (Wishart, 2008).

# **5. Acknowledgment**

Financial support from ERA-NET ERASysBio+ (FRIM) and FP7 KBBE (DROPS, grant agreement number FP7-244374) is acknowledged. All authors acknowledge support from the Metabolome Facility of Bordeaux Functional Genomics Centre.

# **6. References**



protein and metabolite data to study lignin biosynthesis in hybrid aspen. *Journal of Proteome Research,* Vol.8, No. 1, (Jan 2009), pp. 199-210, issn 1535-3893


phenylpropanoid and isoflavonoid biosynthesis in *Medicago truncatula* cell cultures. *Plant Physiology,* Vol.146, No. 2, (Feb 2008), pp. 387-402, issn 0032-0889



demonstration of unacceptable matrix effect in spite of use of a stable isotope analog internal standard. *Rapid Communications in Mass Spectrometry,* Vol.17, No. 15, (Jun 2003), pp. 1723-1734, issn 0951-4198


compartmentalized *Arabidopsis thaliana* leaf metabolome. *Plos One,* Vol.6, No. 3, (Mar 2011), pp. 16, issn 1932-6203








**Metabolomics** Edited by Dr Ute Roessner

ISBN 978-953-51-0046-1 Hard cover, 364 pages **Publisher** InTech **Published online** 10, February, 2012 **Published in print edition** February, 2012

Metabolomics is a rapidly emerging field in life sciences, which aims to identify and quantify metabolites in a biological system. Analytical chemistry is combined with sophisticated informatics and statistics tools to determine and understand metabolic changes upon genetic or environmental perturbations. Together with other 'omics analyses, such as genomics and proteomics, metabolomics plays an important role in functional genomics and systems biology studies in any biological science. This book will provide the reader with summaries of the state-of-the-art of technologies and methodologies, especially in the data analysis and interpretation approaches, as well as give insights into exciting applications of metabolomics in human health studies, safety assessments, and plant and microbial research.

### **How to reference**

In order to correctly reference this scholarly work, feel free to copy and paste the following:

Gibon Yves, Rolin Dominique, Deborde Catherine, Bernillon Stéphane and Moing Annick (2012). New Opportunities in Metabolomics and Biochemical Phenotyping for Plant Systems Biology, Metabolomics, Dr Ute Roessner (Ed.), ISBN: 978-953-51-0046-1, InTech, Available from:

http://www.intechopen.com/books/metabolomics/new-opportunities-in-metabolomics-and-biochemicalphenotyping-for-plant-systems-biology

© 2012 The Author(s). Licensee IntechOpen. This is an open access article distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.